This dataset contains gene expression profiles of MDA231, BT549 and SUM159PT cells after selumetinib treatment or DUSP4 siRNA knockdown. The MDA231, BT549 and SUM159PT basal-like breast cancer cell lines were transfected with non-targeting siRNA (siCONTROL), siRNA targeting DUSP4 (siDUSP4), or siCONTROL followed by 4 h or 24 h of 1 µM selumetinib. The data were log2 RMA normalized.
This dataset has 36 samples, which fall into 3 groups by cell line (MDA231, SUM159PT, BT549). Each cell line has 12 samples, which fall into 2 groups by drug treatment (DMSO, selumetinib). Each drug treatment group has 3 controls and 3 cases.
Basal-like breast cancer (BLBC) is a disease with few clinically approved targeted therapies. This research focuses on dual specificity phosphatase-4 (DUSP4), a negative regulator of mitogen-activated protein kinase (MAPK) pathway activation that is deficient in BLBCs treated with chemotherapy. The paper investigated how DUSP4 regulates the MAP-ERK kinase (MEK) and c-Jun NH2-terminal kinase (JNK) pathways to modify cancer stem cell-like behavior. The results support MEK and JNK pathway inhibitors as therapeutic agents in basal-like breast cancer for eliminating the cancer stem cell population.
The paper describes the methods used to collect the samples and process the data. Gene expression was measured with microarrays, and cells were harvested 96 hours after the control and case conditions were set up. The original paper reports the statistical analyses, which used linear regression, ANOVA and Student t tests. Two-group comparisons used the Student t test, and multiple-group comparisons used ANOVA with Tukey post hoc analyses.
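The two tests named above can be sketched with Python's standard library. This is an illustrative implementation of the pooled-variance Student t statistic and the one-way ANOVA F statistic, not the authors' code. Getting p-values would also require the t and F distribution functions (for example from SciPy, which is not used here), and Tukey post hoc tests are omitted.

```python
from statistics import mean, variance


def student_t(a, b):
    """Two-sample Student t statistic with pooled variance (equal variances assumed).

    Returns (t, degrees_of_freedom).
    """
    na, nb = len(a), len(b)
    # Pooled variance: each group's sample variance weighted by its degrees of freedom.
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5
    return t, na + nb - 2


def one_way_anova_f(*groups):
    """One-way ANOVA F statistic and its (between, within) degrees of freedom."""
    all_values = [x for g in groups for x in g]
    grand = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f, (k - 1, n - k)
```

With exactly two groups, the ANOVA F equals the square of the Student t statistic, so the two functions act as a cross-check on each other.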
The dataset has 36 samples in total from 3 cell lines: MDA231, BT549 and SUM159PT. Each cell line has 12 samples split between 2 treatments, siDUSP4 and selumetinib. Every group of 6 samples consists of 3 CONTROLS and 3 CASES. The analyses below therefore use two sample characteristics: CONTROL versus CASE, and cell line. The linear regression model design was based on these two characteristics.
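A two-factor design of this kind can be sketched as a treatment-coded design matrix, similar in spirit to what R's `model.matrix(~ cell_line + group)` produces. The function name `design_matrix`, the column names and the sample ordering in the usage example are hypothetical. CONTROL is assumed to be the reference group; R would use the first factor level unless the factor is relevelled.

```python
def design_matrix(cell_lines, groups, ref_line, ref_group="CONTROL"):
    """Treatment-coded design matrix (hypothetical helper).

    Columns: an intercept, one 0/1 indicator per non-reference cell line,
    and one 0/1 indicator per non-reference group (e.g. CASE).
    """
    other_lines = sorted(set(cell_lines) - {ref_line})
    other_groups = sorted(set(groups) - {ref_group})
    columns = (["(Intercept)"]
               + [f"cell_line{c}" for c in other_lines]
               + [f"group{g}" for g in other_groups])
    rows = []
    for line, grp in zip(cell_lines, groups):
        row = [1]  # intercept
        row += [1 if line == c else 0 for c in other_lines]
        row += [1 if grp == g else 0 for g in other_groups]
        rows.append(row)
    return columns, rows


# Assumed layout: 12 samples per cell line, alternating blocks of 3 CONTROL and 3 CASE.
lines = ["MDA231"] * 12 + ["BT549"] * 12 + ["SUM159PT"] * 12
groups = (["CONTROL"] * 3 + ["CASE"] * 3) * 6
cols, rows = design_matrix(lines, groups, ref_line="BT549")
```

In this design, the CASE coefficient estimates the CASE versus CONTROL effect after adjusting for baseline differences between cell lines. This matches the observation that most of the variation is between cell lines.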
This is the boxplot of the original data.
This boxplot shows the data after normalization with log2 ratio and cpm.
Boxplot of the original data: blue represents the CONTROL samples and purple represents the CASE samples.
Plot of the GSE41816 dataset after quantile normalization.
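Quantile normalization makes every sample share the same distribution of values. A minimal pure-Python sketch follows. The report does not say which R function was used. This version breaks ties naively, so on tied values its results can differ slightly from implementations such as limma's `normalizeBetweenArrays(method = "quantile")`. The function name is hypothetical.

```python
def quantile_normalize(columns):
    """Quantile-normalize samples so they all share one value distribution.

    columns: list of samples, each a list of per-gene values (same gene order).
    Ties are handled naively: tied values receive different rank means.
    """
    n_genes = len(columns[0])
    # Sort each sample, then average across samples at every rank.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(col[i] for col in sorted_cols) / len(columns)
                  for i in range(n_genes)]
    normalized = []
    for col in columns:
        # Gene indices ordered from smallest to largest value in this sample.
        order = sorted(range(n_genes), key=lambda i: col[i])
        out = [0.0] * n_genes
        for rank, gene_idx in enumerate(order):
            out[gene_idx] = rank_means[rank]
        normalized.append(out)
    return normalized
```

After this step, every sample has the same sorted values, which is why the per-sample boxes in a quantile-normalized boxplot line up.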
QQ-plot of the GSE41816 dataset.
This is the density plot of the original data on the log2 scale.
This is the density plot after applying the cpm function.
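CPM rescales each sample by its total so that samples with different totals become comparable, and the log2 transform compresses the range for density plots. The sketch below uses the voom-style formula log2((x + 0.5) / (N + 1) × 10^6) with a fixed pseudocount. edgeR's `cpm(log = TRUE)` uses a library-size-scaled prior count instead, so its values will differ slightly. The function name `log2_cpm` is hypothetical.

```python
import math


def log2_cpm(counts, pseudocount=0.5):
    """Simplified log2 counts-per-million for one sample (voom-style offsets).

    counts: list of per-gene values for a single sample.
    """
    lib_size = sum(counts)
    # Add a small pseudocount so zero values do not produce log2(0).
    return [math.log2((c + pseudocount) / (lib_size + 1.0) * 1e6) for c in counts]
```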
The MDS plot shows the relationship between CONTROL and CASE samples. CONTROL and CASE samples lie close together, and the largest differences are between cell lines.
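edgeR's `plotMDS` (which calls limma's `plotMDS`) places samples using the "leading log-fold-change" distance by default. This is the root-mean-square of the 500 largest absolute log2 differences between each pair of samples, followed by classical multidimensional scaling. The sketch below computes only the distance-matrix step. The 2-D coordinates would then come from classical scaling (for example R's `cmdscale`), which is not reproduced here. The function names are hypothetical.

```python
import math


def leading_logfc_distance(x, y, top=500):
    """RMS of the `top` largest absolute per-gene log2 differences between two samples."""
    diffs = sorted((abs(a - b) for a, b in zip(x, y)), reverse=True)[:top]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def distance_matrix(samples, top=500):
    """Symmetric matrix of leading log-fold-change distances between all samples."""
    n = len(samples)
    return [[leading_logfc_distance(samples[i], samples[j], top) for j in range(n)]
            for i in range(n)]
```

Because only the most different genes enter each pairwise distance, strong cell-line-specific genes dominate the result. This fits the MDS plot separating samples by cell line rather than by CONTROL versus CASE.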
## [1] 947
## [1] 292
## [1] 613
## [1] 334
The quasi-likelihood F-test (glmQLFTest from the edgeR package) was applied to the MDA231 cell line. 947 genes have p-values below the 0.05 threshold, and 292 of them pass multiple-testing correction. Among the 947 genes, 613 are up-regulated and 334 are down-regulated.
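The counts above can be derived from a table of per-gene logFC and p-values. The sketch below rests on two assumptions. First, "pass correction" means a Benjamini-Hochberg FDR below 0.05 (edgeR's `topTags` reports BH-adjusted FDR by default). Second, the up/down counts are taken among genes with p < 0.05, which is consistent with 613 + 334 = 947. The function names are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (same definition as R's p.adjust(method = "BH"))."""
    m = len(pvalues)
    # Walk from the largest p-value down, keeping a running minimum of p * m / rank.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank_from_top, i in enumerate(order):
        rank = m - rank_from_top  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def summarize(results, alpha=0.05):
    """results: list of (gene_id, logFC, p_value).

    Returns (n_p_below_alpha, n_fdr_below_alpha, n_up, n_down); up/down use p < alpha.
    """
    fdr = benjamini_hochberg([p for _, _, p in results])
    n_sig = sum(p < alpha for _, _, p in results)
    n_fdr = sum(q < alpha for q in fdr)
    n_up = sum(p < alpha and lfc > 0 for _, lfc, p in results)
    n_down = sum(p < alpha and lfc < 0 for _, lfc, p in results)
    return n_sig, n_fdr, n_up, n_down
```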
## logFC logCPM F PValue FDR
## ENSG00000166825 -1.325501 6.598318 4881.474 1.953507e-10 3.594062e-06
## ENSG00000148677 1.290621 6.270545 3539.259 1.307444e-09 7.207435e-06
## ENSG00000164176 1.359398 6.437174 2132.666 1.394339e-09 7.207435e-06
## ENSG00000171345 1.312254 6.168467 3735.305 1.567004e-09 7.207435e-06
## ENSG00000138685 -1.354013 6.291498 2052.621 1.980686e-09 7.288132e-06
## ENSG00000178860 -1.321900 6.311231 1592.114 2.913912e-09 7.611416e-06
## X.GSM1024692. X.GSM1024693. X.GSM1024694. X.GSM1024695.
## ENSG00000166825 5.281563 4.937310 4.502223 4.299639
## ENSG00000148677 11.520560 11.367750 11.387870 12.188910
## ENSG00000164176 10.715520 10.619550 10.501440 9.942396
## ENSG00000171345 11.352290 11.467580 11.412640 10.544090
## ENSG00000138685 4.199064 3.843173 3.583468 4.350140
## ENSG00000178860 4.589334 3.993551 4.157411 4.164770
## X.GSM1024696. X.GSM1024697. X.GSM1024698. X.GSM1024699.
## ENSG00000166825 4.562559 4.559133 4.820214 5.217629
## ENSG00000148677 12.125530 12.191380 11.200870 11.279510
## ENSG00000164176 9.703603 9.920992 10.741190 10.393960
## ENSG00000171345 10.698020 10.632230 11.447330 10.844630
## ENSG00000138685 4.513221 3.639465 3.965435 5.034817
## ENSG00000178860 4.311838 3.681640 4.030405 4.623116
## X.GSM1024700. X.GSM1024701. X.GSM1024702. X.GSM1024703.
## ENSG00000166825 5.249080 4.782530 4.612532 5.054136
## ENSG00000148677 11.459280 11.416100 11.463730 11.433180
## ENSG00000164176 10.608210 11.461900 11.492240 11.427310
## ENSG00000171345 11.158700 11.523170 11.562280 11.389000
## ENSG00000138685 3.571180 3.754532 3.848157 4.121012
## ENSG00000178860 4.094188 3.564721 4.586854 4.581983
## X.GSM1024704. X.GSM1024705. X.GSM1024706. X.GSM1024707.
## ENSG00000166825 12.004110 11.962360 11.885780 12.016830
## ENSG00000148677 6.104874 6.141503 6.114605 5.725344
## ENSG00000164176 10.917870 10.884730 10.943350 10.563790
## ENSG00000171345 4.873714 4.409018 4.566517 4.259648
## ENSG00000138685 7.435131 7.418010 7.649821 8.264706
## ENSG00000178860 7.780212 7.622669 7.714523 7.107294
## X.GSM1024708. X.GSM1024709. X.GSM1024710. X.GSM1024711.
## ENSG00000166825 12.013260 11.926180 12.034530 12.062970
## ENSG00000148677 5.833907 5.701848 5.767415 5.554355
## ENSG00000164176 10.546890 10.606350 10.764500 10.889470
## ENSG00000171345 4.138494 4.003452 4.702843 4.733443
## ENSG00000138685 8.281728 8.080848 7.641626 7.689951
## ENSG00000178860 7.585727 7.459564 7.955678 8.045712
## X.GSM1024712. X.GSM1024713. X.GSM1024714. X.GSM1024715.
## ENSG00000166825 11.963400 12.059590 12.118400 12.061240
## ENSG00000148677 5.584886 5.720542 5.260229 5.900916
## ENSG00000164176 10.736000 10.973390 10.737780 10.859430
## ENSG00000171345 4.729569 4.536699 4.353832 4.975224
## ENSG00000138685 7.729908 7.897225 7.910603 7.991155
## ENSG00000178860 7.846854 8.678860 8.241144 8.573381
## X.GSM1024716. X.GSM1024717. X.GSM1024718. X.GSM1024719.
## ENSG00000166825 12.520290 12.405170 12.454030 12.388500
## ENSG00000148677 4.828815 4.317569 4.599101 4.942575
## ENSG00000164176 3.941980 4.238735 4.526144 4.284027
## ENSG00000171345 4.741997 4.374214 4.303652 4.394619
## ENSG00000138685 10.583470 10.537690 10.584110 10.824450
## ENSG00000178860 10.820920 11.010890 10.939960 9.956842
## X.GSM1024720. X.GSM1024721. X.GSM1024722. X.GSM1024723.
## ENSG00000166825 12.410740 12.389160 12.444240 12.439400
## ENSG00000148677 4.842864 4.845449 4.700742 4.661554
## ENSG00000164176 4.660891 4.195507 4.089976 3.712907
## ENSG00000171345 4.525424 4.203963 4.522422 4.564958
## ENSG00000138685 10.843160 10.848110 10.589870 10.467830
## ENSG00000178860 10.040360 10.090260 11.143030 11.195160
## X.GSM1024724. X.GSM1024725. X.GSM1024726. X.GSM1024727.
## ENSG00000166825 12.384330 12.250760 12.491760 12.457070
## ENSG00000148677 4.981931 4.183184 4.678524 5.022226
## ENSG00000164176 3.578350 3.727744 4.292710 4.114725
## ENSG00000171345 4.532353 4.249921 4.680730 4.616951
## ENSG00000138685 10.521600 10.465490 10.690850 10.609710
## ENSG00000178860 11.307790 11.036660 11.094830 11.062190
These two tables show part of the quasi-likelihood results after fitting the model and calculating p-values. The first lists the top hits ranked by p-value, and the second lists their original expression values in all 36 samples.
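The top-hits table is the gene list sorted by p-value and joined to the expression matrix by gene ID. Here is a minimal sketch with hypothetical names that uses dictionaries in place of R data frames. The example values come from the first table above.

```python
def top_hits(stats, expression, n=6):
    """Rank genes by p-value and attach each gene's original expression values.

    stats: dict gene_id -> (logFC, p_value)
    expression: dict gene_id -> list of per-sample values
    Returns a list of (gene_id, logFC, p_value, values) for the n smallest p-values.
    """
    ranked = sorted(stats, key=lambda gene: stats[gene][1])[:n]
    return [(gene, stats[gene][0], stats[gene][1], expression[gene]) for gene in ranked]


stats = {
    "ENSG00000148677": (1.290621, 1.307444e-09),
    "ENSG00000166825": (-1.325501, 1.953507e-10),
    "ENSG00000164176": (1.359398, 1.394339e-09),
}
expression = {gene: [0.0] for gene in stats}  # placeholder values for illustration
hits = top_hits(stats, expression, n=2)
```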
## [1] "ENSG00000117602" "ENSG00000175793" "ENSG00000162599" "ENSG00000184588"
## [5] "ENSG00000099260" "ENSG00000134247"
These are some of the extracted differentially expressed gene IDs. The two plots visualize the number of differentially expressed genes.
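Splitting the significant genes by the sign of logFC gives the up- and down-regulated lists used later for g:Profiler. This sketch assumes a p < 0.05 cut-off and takes (gene_id, logFC, p_value) tuples as input. The function name is hypothetical, and the example values come from the quasi-likelihood table above.

```python
def split_by_direction(results, alpha=0.05):
    """Return (up, down) lists of gene IDs with p < alpha, most significant first.

    results: iterable of (gene_id, logFC, p_value).
    """
    significant = sorted((r for r in results if r[2] < alpha), key=lambda r: r[2])
    up = [gene for gene, lfc, _ in significant if lfc > 0]
    down = [gene for gene, lfc, _ in significant if lfc < 0]
    return up, down


results = [
    ("ENSG00000166825", -1.325501, 1.953507e-10),
    ("ENSG00000148677", 1.290621, 1.307444e-09),
    ("ENSG00000164176", 1.359398, 1.394339e-09),
    ("gene_x", 0.2, 0.3),  # hypothetical non-significant gene
]
up, down = split_by_direction(results)
# One ID per line is a convenient format for pasting into a web query box.
up_text = "\n".join(up)
```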
The heatmap shows the top differentially expressed genes (p < 0.05) identified by the quasi-likelihood test.
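Expression heatmaps are often drawn on row-scaled values, so that colour reflects each gene's relative expression across samples rather than its absolute level. The report does not state whether its heatmaps were scaled. The sketch below shows per-gene z-scoring only as an assumption, with a hypothetical function name.

```python
from statistics import mean, stdev


def row_zscore(matrix):
    """Scale each gene (row) to mean 0 and sample standard deviation 1.

    Rows with zero variance are mapped to all zeros to avoid division by zero.
    """
    scaled = []
    for row in matrix:
        mu, sd = mean(row), stdev(row)
        scaled.append([(x - mu) / sd if sd > 0 else 0.0 for x in row])
    return scaled
```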
## ID logFC AveExpr t P.Value adj.P.Val
## 5781 ENSG00000187720 0.7370299 7.479091 8.640918 1.120608e-10 2.061695e-06
## 5581 ENSG00000080824 0.3500389 11.425553 6.896796 2.651832e-08 1.427723e-04
## 1239 ENSG00000157193 0.4629367 9.159892 6.874071 2.852303e-08 1.427723e-04
## 12614 ENSG00000144821 0.3827667 5.279393 6.796637 3.657034e-08 1.427723e-04
## 10740 ENSG00000172296 -0.4563264 8.084559 -6.735349 4.453032e-08 1.427723e-04
## 12928 ENSG00000145147 0.4142466 9.335957 6.721477 4.656125e-08 1.427723e-04
## B
## 5781 13.694997
## 5581 8.797634
## 1239 8.731628
## 12614 8.506416
## 10740 8.327834
## 12928 8.287376
## [1] 2670
## [1] 480
The table above is a sample of the output from the lmFit linear regression. In total, 2670 genes have p-values below the 0.05 threshold, and 480 genes pass multiple-testing correction.
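Unlike the quasi-likelihood test, limma's eBayes step moderates each gene's t statistic by shrinking its residual variance toward a prior estimated from all genes. The posterior variance is s̃² = (d0·s0² + d·s²) / (d0 + d), and the moderated t is t̃ = coefficient / (stdev.unscaled · s̃). In the sketch below, the prior values d0 and s0² are supplied as inputs, whereas limma estimates them from the data. The function names and the numbers in the usage example are illustrative only.

```python
import math


def posterior_variance(s2, df, s2_prior, df_prior):
    """Shrink a gene's residual variance s2 (with df degrees of freedom) toward the prior."""
    return (df_prior * s2_prior + df * s2) / (df_prior + df)


def moderated_t(coef, stdev_unscaled, s2_post):
    """Moderated t = coefficient / (unscaled standard deviation * posterior standard deviation)."""
    return coef / (stdev_unscaled * math.sqrt(s2_post))


# Illustrative numbers only (not taken from this dataset).
s2_post = posterior_variance(s2=0.01, df=32, s2_prior=0.04, df_prior=4)
t = moderated_t(coef=0.5, stdev_unscaled=0.4, s2_post=s2_post)
```

Shrinkage pulls very small variance estimates upward. This is one reason limma and the quasi-likelihood test can rank genes differently even on the same data.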
The heatmap shows the top differentially expressed genes (p < 0.05) identified by lmFit.
This plot compares the results of the quasi-likelihood model and the limma model.
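A numeric complement to the comparison plot is the overlap between the genes each method calls significant. This sketch assumes p < 0.05 for both methods and takes dictionaries that map gene ID to p-value. The function name and the Jaccard summary are illustrative choices, not part of the original analysis.

```python
def compare_significant(p_ql, p_limma, alpha=0.05):
    """Overlap of genes called significant by two methods.

    p_ql, p_limma: dict gene_id -> p_value.
    Returns counts for both/only-one and the Jaccard index of the two sets.
    """
    sig_ql = {g for g, p in p_ql.items() if p < alpha}
    sig_limma = {g for g, p in p_limma.items() if p < alpha}
    both = sig_ql & sig_limma
    union = sig_ql | sig_limma
    return {
        "both": len(both),
        "ql_only": len(sig_ql - sig_limma),
        "limma_only": len(sig_limma - sig_ql),
        "jaccard": len(both) / len(union) if union else 0.0,
    }
```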
Fig. 1. Balko, J. M., Schwarz, L. J., Bhola, N. E., Kurupi, R., Owens, P., Miller, T. W., … Arteaga, C. L. (2013, October 15). Activation of MAPK pathways due to DUSP4 loss promotes cancer stem cell-like phenotypes in basal-like breast cancer.
The figure above reports p-values from ANOVA and from a two-tailed Student t test. The p-value in panel B comes from ANOVA, and the p-value in panel E comes from the two-tailed Student t test.
Fig. 5. Balko, J. M., Schwarz, L. J., Bhola, N. E., Kurupi, R., Owens, P., Miller, T. W., … Arteaga, C. L. (2013, October 15). Activation of MAPK pathways due to DUSP4 loss promotes cancer stem cell-like phenotypes in basal-like breast cancer.
Microarray analysis was conducted on RNA derived from MDA231, BT549 and SUM159PT cells treated with siCONTROL or siDUSP4, and with 4 h or 24 h of selumetinib. Panel A is the heatmap of significantly altered genes in MDA231 cells.
With the up-regulated and down-regulated gene sets in hand, each list was analyzed separately with g:Profiler.
A Benjamini-Hochberg FDR significance threshold was used. The selected data sources were GO molecular function, cellular component and biological process, Reactome, WikiPathways, all regulatory motifs in DNA, all protein databases, and the Human Phenotype Ontology. The result sample size was reduced from 1000 to 500.
GO:BP - bundle of His cell to Purkinje myocyte communication GO:0086069
REAC - Platelet activation, signaling and aggregation REAC:R-HSA-76002
WP - Deregulation of Rab and Rab Effector Genes in Bladder Cancer WP:WP2291
GO:BP - skeletal system morphogenesis GO:0048705
REAC - Extracellular matrix organization REAC:R-HSA-1474244
WP - Tryptophan catabolism leading to NAD+ production WP:WP4210
Comparing the two g:Profiler results, the down-regulated gene list returned more informative terms than the up-regulated gene list. This is consistent with the paper, in which DUSP4 expression down-regulates MEK and JNK pathway activity and thereby decreases the cancer stem cell population.